
feat(init): add init command for guided Sentry project setup#283

Open
betegon wants to merge 49 commits into main from feat/init-command
Conversation

betegon (Member) commented Feb 23, 2026

Summary

Adds sentry init — an AI-powered wizard that walks users through adding Sentry to their project. It detects the platform, installs the SDK, instruments the code, and configures error monitoring, tracing, and session replay.

Changes

  • New init command backed by a Mastra AI workflow (hosted at getsentry/cli-init-api) that handles platform detection, SDK installation, and code instrumentation
  • ASCII banner, AI transparency note, and review reminder in the wizard UX
  • Tracing: unique trace IDs per wizard run with flattened span hierarchy
  • Python platforms use venv for isolated dependency installation
  • Command execution guardrails: shell metacharacter blocking, dangerous executable blocklist, path-traversal prevention
  • Magic values extracted into named constants (constants.ts)
  • Docs page added to cli.sentry.dev
  • Eval test suite (test/init-eval/) — see below

Eval suite

The eval suite validates that the wizard produces correct, buildable Sentry instrumentation for each supported platform. It uses a 3-phase test architecture:

Phase 1: Wizard run

Each test scaffolds a fresh project from a platform template, then runs the full sentry init wizard against it. The wizard output (exit code, stdout/stderr, git diff, new files) is captured for the next phases.

Phase 2: Hard assertions (deterministic)

Five code-based pass/fail checks that run without any LLM:

  1. exit-code: wizard exits 0
  2. sdk-installed: the Sentry SDK package appears in the dependency file (package.json / requirements.txt)
  3. init-present: Sentry.init (or sentry_sdk.init) appears in changed or new files
  4. no-placeholder-dsn: no leftover placeholder DSNs (___PUBLIC_DSN___, YOUR_DSN_HERE, etc.)
  5. build-succeeds: npm run build (or the platform's equivalent) passes after the wizard's changes
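
As a rough illustration of how three of these deterministic checks could be written, here is a minimal sketch. The `WizardOutput` shape, the function names, and the regular expression are assumptions made for this example and are not taken from test/init-eval/; only the two placeholder strings come from the description above.

```typescript
// Hypothetical shape of what Phase 1 captures; field names are assumptions.
interface WizardOutput {
  exitCode: number;
  diff: string; // git diff of files the wizard changed
  newFiles: Record<string, string>; // path -> contents of files the wizard created
}

// Placeholder markers named in the PR description; the real list may be longer.
const PLACEHOLDER_DSNS = ["___PUBLIC_DSN___", "YOUR_DSN_HERE"];

// Matches `Sentry.init(` (JavaScript SDKs) and `sentry_sdk.init(` (Python SDK).
const INIT_PATTERN = /\b(?:Sentry|sentry_sdk)\.init\s*\(/;

// All text the wizard touched: the diff plus every newly created file.
function changedText(out: WizardOutput): string {
  return [out.diff, ...Object.values(out.newFiles)].join("\n");
}

function checkExitCode(out: WizardOutput): boolean {
  return out.exitCode === 0;
}

function checkInitPresent(out: WizardOutput): boolean {
  return INIT_PATTERN.test(changedText(out));
}

function checkNoPlaceholderDsn(out: WizardOutput): boolean {
  const text = changedText(out);
  return !PLACEHOLDER_DSNS.some((marker) => text.includes(marker));
}
```

Note that this sketch scans the raw diff, so removed lines are searched as well; a stricter check would look only at added lines.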

Phase 3: LLM judge (per-feature)

For each feature (errors, tracing, replay, logs, profiling, etc.), an LLM judge scores correctness:

  • Official Sentry docs are fetched as ground truth (URLs mapped in feature-docs.json)
  • GPT-4o evaluates the wizard's diff + new files against the docs on 4 criteria: feature-initialized, correct-imports, no-syntax-errors, follows-docs
  • Each criterion is scored pass/fail/unknown; the overall feature score must be >= 0.5
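
The description does not say how "unknown" verdicts feed into the feature score, so the following is only a sketch under one possible reading: unknown criteria are left out of the denominator, and a feature with no decided criteria scores 0. The type and function names are hypothetical; the four criterion names and the 0.5 threshold come from the list above.

```typescript
type Verdict = "pass" | "fail" | "unknown";

// The four criteria named in the PR description.
const CRITERIA = [
  "feature-initialized",
  "correct-imports",
  "no-syntax-errors",
  "follows-docs",
] as const;

type Criterion = (typeof CRITERIA)[number];
type FeatureVerdicts = Record<Criterion, Verdict>;

const PASS_THRESHOLD = 0.5;

// Assumption: "unknown" verdicts are excluded from the score, and a feature
// with no decided criteria is scored 0 so that it cannot pass by default.
function scoreFeature(verdicts: FeatureVerdicts): { score: number; passed: boolean } {
  const decided = CRITERIA.map((c) => verdicts[c]).filter((v) => v !== "unknown");
  const passes = decided.filter((v) => v === "pass").length;
  const score = decided.length === 0 ? 0 : passes / decided.length;
  return { score, passed: score >= PASS_THRESHOLD };
}
```

Under this reading, two passes, one fail, and one unknown give a score of 2/3, which clears the threshold.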

Platforms

6 platform templates are covered:

| Platform | Template | SDK |
| --- | --- | --- |
| Express | express/ | @sentry/node |
| Next.js | nextjs/ | @sentry/nextjs |
| SvelteKit | sveltekit/ | @sentry/sveltekit |
| React + Vite | react-vite/ | @sentry/react |
| Flask | python-flask/ | sentry-sdk |
| FastAPI | python-fastapi/ | sentry-sdk |

Running

bun run test:init-eval          # all platforms

Requires SENTRY_AUTH_TOKEN, SENTRY_ORG, SENTRY_PROJECT, and optionally OPENAI_API_KEY (LLM judge is skipped without it).

Test Plan

  • bun run test:init-eval passes for all 6 platforms
  • bun run lint and bun run typecheck pass
  • CI passes (unit tests, e2e, lint, typecheck, build)
  • CI workflow for eval is tracked separately in Run init evals on CI #290

🤖 Generated with Claude Code

betegon and others added 21 commits February 17, 2026 20:48
Adds `sentry init` wizard that walks users through project setup via
the Mastra API, handling DSN configuration, SDK installation prompts,
and local file operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sends tags and metadata (CLI version, OS, arch, node version) with
startAsync and resumeAsync calls so workflow runs are visible and
filterable in Mastra Studio.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Import randomBytes and generate a hex trace ID so all
suspend/resume calls within a single wizard run share one trace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
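
A minimal sketch of the shared trace ID idea from the commit above. The 16-byte length (32 hex characters, the W3C trace-id width) is an assumption, since the commit message does not state it, and `createRunTracing` is a hypothetical helper rather than the wizard's actual code.

```typescript
import { randomBytes } from "node:crypto";

// 16 random bytes rendered as 32 lowercase hex characters.
// Assumption: the commit does not say how many bytes are used.
function generateTraceId(): string {
  return randomBytes(16).toString("hex");
}

// Hypothetical helper: create the ID once per wizard run and hand the same
// value to every start/resume call so they all land in one trace.
function createRunTracing(): { tracingOptions: () => { traceId: string } } {
  const traceId = generateTraceId();
  return { tracingOptions: () => ({ traceId }) };
}
```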
Add a synthetic parentSpanId to tracingOptions so all workflow run
spans become siblings under the same parent instead of nesting by
timestamp containment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The parentSpanId was creating artificial nesting - let the workflow
engine handle span hierarchy naturally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Display the branded SENTRY ASCII banner before the intro line for visual
consistency with `sentry --help`. Make the "errors" feature always enabled
in the feature multi-select so users cannot deselect error monitoring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pt, and source maps hint

Route success-with-exitCode results to formatError so the --force hint
is shown when Sentry is already installed. Fold the "Error Monitoring is
always included" note into the multiselect prompt. Use a more approachable
Source Maps hint.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Show a non-blocking info note about AI usage with a docs link before
the first network call, and a review reminder before the success outro.
Extract SENTRY_DOCS_URL constant to share between wizard-runner and
clack-utils cancel message.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add @anthropic-ai/sdk and openai as devDependencies for the LLM-as-judge
eval framework. Add opencode-lore dependency. Exclude test/init-eval/templates
from biome linting since they are fixture apps, not source code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add LLM-as-judge eval tests for the init wizard across all five
platforms (Express, Next.js, Flask, React+Vite, SvelteKit). Each test
runs the wizard end-to-end and asserts on SDK installation, Sentry.init
presence, build success, and documentation accuracy via an LLM judge.

Includes template apps, helper utilities (assertions, doc-fetcher,
judge, platform configs), and feature-docs.json mapping.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a separate workflow for running init-eval tests on demand. Supports
running a single platform or all platforms via matrix. Uses the init-eval
GitHub environment for MASTRA_API_URL and OPENAI_API_KEY secrets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Store python-fastapi doc URLs as base paths (with trailing slash) like
other platforms, and convert to .md at fetch time. This mirrors the
pattern in cli-init-api and lets us return clean markdown directly
instead of stripping HTML tags.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
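
One way the "convert to .md at fetch time" step could look, as a sketch only. The function name and the exact rule (drop trailing slashes, append `.md`) are assumptions based on the commit message above, not code from the eval suite.

```typescript
// Turn a docs base path with a trailing slash into its markdown variant.
// Assumption: the markdown page lives at the same path with ".md" appended
// in place of the trailing slash.
function toMarkdownUrl(baseUrl: string): string {
  const url = new URL(baseUrl);
  url.pathname = url.pathname.replace(/\/+$/, "") + ".md";
  return url.toString();
}
```

Storing base paths and converting late keeps feature-docs.json uniform across platforms, which appears to be the motivation stated in the commit.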
Add Sentry doc URLs for python-flask (getting-started, errors, tracing,
logs, profiling) and add the shared python/profiling page to both flask
and fastapi profiling entries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Sentry doc URLs for all nextjs features: getting-started, errors,
logs, tracing, session replay, metrics, and profiling (browser + node).
Sourcemaps left empty for now.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Sentry doc URLs for sveltekit features and add missing logs,
metrics, and profiling features to the platform entry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Sentry doc URLs for react-vite features and add missing logs,
metrics, and profiling features to the platform entry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Flask eval was using bare `pip install` which fails when pip isn't on
PATH. Use the same venv pattern as fastapi. Also remove accidental
opencode-lore runtime dependency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot (Contributor) commented Feb 23, 2026

Semver Impact of This PR

🟡 Minor (new features)

📋 Changelog Preview

This is how your changes will appear in the changelog.
Entries from this PR are highlighted with a left border (blockquote style).


New Features ✨

  • (formatters) Render all terminal output as markdown by BYK in #297
  • (init) Add init command for guided Sentry project setup by betegon in #283
  • (issue-list) Global limit with fair distribution, compound cursor, and richer progress by BYK in #306

Bug Fixes 🐛

Api

  • Use numeric project ID to avoid "not actively selected" error by betegon in #312
  • Use limit param for issues endpoint page size by BYK in #309
  • Auto-correct ':' to '=' in --field values with a warning by BYK in #302

Formatters

  • Expand streaming table to fill terminal width by betegon in #314
  • Fix HTML entities and escaped underscores in table output by betegon in #313

Other

  • (ci) Generate JUnit XML to silence codecov-action warnings by BYK in #300
  • (nightly) Push to GHCR from artifacts dir so layer titles are bare filenames by BYK in #301
  • (test) Handle 0/-0 in getComparator anti-symmetry property test by BYK in #308

Internal Changes 🔧

  • (api) Wire listIssuesPaginated through @sentry/api SDK for type safety by BYK in #310

🤖 This preview updates automatically when you update the PR.

github-actions bot (Contributor) commented Feb 23, 2026

Codecov Results 📊

45 passed | Total: 45 | Pass Rate: 100% | Execution Time: 0ms

📊 Comparison with Base Branch

| Metric | Change |
| --- | --- |
| Total Tests | 📉 -2246 |
| Passed Tests | 📉 -2246 |
| Failed Tests | |
| Skipped Tests | |

All tests are passing successfully.

✅ Patch coverage is 94.97%. Project has 3854 uncovered lines.
❌ Project coverage is 78.89%. Comparing base (base) to head (head).

Files with missing lines (4)
| File | Patch % | Lines |
| --- | --- | --- |
| wizard-runner.ts | 82.59% | ⚠️ 35 Missing |
| app.ts | 81.74% | ⚠️ 21 Missing |
| local-ops.ts | 97.49% | ⚠️ 8 Missing |
| help.ts | 97.39% | ⚠️ 3 Missing |
Coverage diff
@@            Coverage Diff             @@
##          main       #PR       +/-##
==========================================
- Coverage    80.14%    78.89%    -1.25%
==========================================
  Files          120       127        +7
  Lines        16316     18254     +1938
  Branches         0         0         —
==========================================
+ Hits         13075     14400     +1325
- Misses        3241      3854      +613
- Partials         0         0         —

Generated by Codecov Action

betegon and others added 3 commits February 23, 2026 22:16
Restrict GITHUB_TOKEN to contents:read as flagged by CodeQL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update SvelteKit template with working deps (adapter-node, latest
svelte/vite) and add required src files (app.d.ts, app.html). Use
python3 instead of python for venv creation in Flask/FastAPI platforms.
Add --concurrency 6 to init-eval test runner.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add push/pull_request triggers so the eval runs automatically alongside
other CI checks. Keep workflow_dispatch for manual single-platform runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
betegon and others added 4 commits February 26, 2026 10:53
Move hardcoded numeric values, string literals, and exit codes into
constants.ts for better readability and maintainability across the
init module.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move regex to top-level constant (useTopLevelRegex)
- Remove unused template literal (noUnusedTemplateLiteral)
- Replace explicit `return undefined` with bare `return` (noUselessUndefined)
- Apply formatter to both source and test files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add tests for local-ops (FS operations, command execution, patchset
application), formatters (result/error display), help (banner/custom
help output), interactive prompts (select/multiselect/confirm), and
wizard-runner (TTY check, success/error paths, suspend/resume loop).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
betegon and others added 4 commits March 2, 2026 16:55
Keep both openai and marked dependencies, add test:init-eval script
back, and take main's version (0.14.0-dev.0) and restructured
package.json layout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…coverage

Include test/isolated in the test:unit coverage run so that existing
comprehensive tests for wizard-runner and interactive modules count
toward patch coverage. Add new tests for init command parsing,
clack-utils utilities, cancel paths, and wizard-runner edge cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bun's mock.module() leaks between test files in the same run. Keep
test:unit and test:isolated as separate invocations, add coverage
flags to test:isolated, and merge lcov reports before upload.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@betegon betegon marked this pull request as ready for review March 2, 2026 18:07
The init-command test's mock.module() for interactive.js was poisoning
init-interactive.test.ts in the same bun test run. Moved to
test/commands/ with a single wizard-runner.js mock instead of 7
redundant mocks — no other test in test/commands/ depends on that module.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
betegon and others added 2 commits March 2, 2026 19:35
bun's mock.module() leaks across files when run in a single process.
Run each test/isolated/*.test.ts file in its own bun test invocation
to ensure true mock isolation, accumulating LCOV coverage for CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Running each isolated test in its own bun process produces overlapping
coverage for shared source files. Concatenating these LCOV files created
duplicate SF entries that codecov counted as separate files, inflating
line counts and dropping project coverage from 80% to 51%.

Add script/merge-lcov.sh (awk) to deduplicate by source file, taking
the max hit count per line, so codecov sees 51 unique files instead of
133 duplicate entries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
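
The actual script/merge-lcov.sh is written in awk and is not shown here. The sketch below restates the same idea in TypeScript: group records by `SF:` path and keep the maximum hit count per `DA:` line. It is an illustration under simplifying assumptions: only `SF`, `DA`, and `end_of_record` are handled, function and branch records are dropped, and `LF`/`LH` are recomputed from the merged lines.

```typescript
// Merge several LCOV tracefiles so each source file appears once, keeping the
// highest hit count seen for every line.
function mergeLcov(inputs: string[]): string {
  const files = new Map<string, Map<number, number>>();
  let current: Map<number, number> | undefined;

  for (const input of inputs) {
    for (const raw of input.split("\n")) {
      const line = raw.trim();
      if (line.startsWith("SF:")) {
        const sourcePath = line.slice(3);
        current = files.get(sourcePath) ?? new Map<number, number>();
        files.set(sourcePath, current);
      } else if (line.startsWith("DA:") && current) {
        // DA:<line number>,<hit count>[,<checksum>]
        const parts = line.slice(3).split(",");
        const lineNo = Number(parts[0]);
        const hits = Number(parts[1]);
        if (Number.isNaN(lineNo) || Number.isNaN(hits)) continue;
        current.set(lineNo, Math.max(current.get(lineNo) ?? 0, hits));
      } else if (line === "end_of_record") {
        current = undefined;
      }
    }
  }

  const out: string[] = [];
  for (const [sourcePath, lines] of files) {
    const sorted = [...lines.entries()].sort((a, b) => a[0] - b[0]);
    out.push(`SF:${sourcePath}`);
    for (const [lineNo, hits] of sorted) out.push(`DA:${lineNo},${hits}`);
    out.push(`LF:${sorted.length}`); // lines found
    out.push(`LH:${sorted.filter(([, hits]) => hits > 0).length}`); // lines hit
    out.push("end_of_record");
  }
  return out.join("\n") + "\n";
}
```

Taking the maximum rather than the sum means a line covered in any one process counts as covered, without inflating hit counts when the same test file is measured twice.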
Convert init-interactive and init-wizard-runner tests from isolated
mock.module() pattern to spyOn() on namespace imports, eliminating
mock leakage without process isolation. Also fix CI coverage merge
to deduplicate LCOV entries via merge-lcov.sh.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
betegon and others added 2 commits March 2, 2026 20:33
spyOn on local TS module exports doesn't intercept in bun on Linux,
so wizard-runner must stay isolated with mock.module(). The interactive
test remains in test/lib/init/ since it only spies on @clack/prompts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wrap fd operations in try/finally so fs.closeSync is always called,
even if fs.readSync throws an I/O error.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
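
The try/finally pattern described in the commit above can be sketched as follows. `readHead` and its parameters are hypothetical names for illustration; the `fs` calls are the standard Node.js synchronous file APIs.

```typescript
import * as fs from "node:fs";

// Read up to maxBytes from the start of a file. The descriptor is closed in
// `finally`, so it is released even if readSync throws.
function readHead(filePath: string, maxBytes: number): Buffer {
  const fd = fs.openSync(filePath, "r");
  try {
    const buffer = Buffer.alloc(maxBytes);
    const bytesRead = fs.readSync(fd, buffer, 0, maxBytes, 0);
    return buffer.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}
```

`openSync` sits outside the `try` on purpose: if opening fails there is no descriptor to close.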
betegon and others added 3 commits March 2, 2026 20:51
…rm shell

Add >, <, and & to SHELL_METACHARACTER_PATTERNS to prevent redirection
(e.g. `npm install foo > /arbitrary/path`) and background execution
(e.g. `npm install foo & curl evil.com`) in validated commands.

Replace hardcoded `spawn("sh", ...)` with `spawn(command, [], { shell: true })`
so Node selects the platform-appropriate shell (cmd.exe on Windows).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The remote workflow controls payload.cwd, but handleLocalOp never
checked that it falls within options.directory. A misbehaving workflow
could set cwd:"/" to escape the path sandbox entirely. Now cwd is
validated at the top of handleLocalOp before any operation runs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
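
A containment check of the kind this commit describes might look like the sketch below. `isInsideDirectory` is a hypothetical name, and the real validation in handleLocalOp may differ; in particular, this version compares resolved paths only and does not follow symlinks.

```typescript
import * as path from "node:path";

// True when `candidate` (absolute, or relative to `root`) resolves to `root`
// itself or to a location beneath it.
function isInsideDirectory(root: string, candidate: string): boolean {
  const resolvedRoot = path.resolve(root);
  const resolved = path.resolve(resolvedRoot, candidate);
  const rel = path.relative(resolvedRoot, resolved);
  if (rel === "") return true; // same directory
  const escapes = rel === ".." || rel.startsWith(".." + path.sep);
  return !escapes && !path.isAbsolute(rel);
}
```

Comparing via `path.relative` avoids the prefix pitfall where `/work/app-evil` would pass a naive `startsWith("/work/app")` test.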
runWizard swallows errors and returns void, but most error paths were
missing process.exitCode = 1, causing failed initializations to exit
with code 0 in CI/CD pipelines.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Move createRun() into try/catch so network failures get graceful
  "Connection failed" message instead of an unhandled stack trace
- Block shell expansion characters ($, ', ", \) in validateCommand to
  prevent bypass via ANSI-C quoting, variable expansion, and escapes
- Remove unused stdout/stderr/stdin fields from WizardOptions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Comment on lines +177 to +183
const payload = extractSuspendPayload(result, stepId);
if (!payload) {
  spin.stop("Error", 1);
  log.error(`No suspend payload found for step "${stepId}"`);
  cancel("Setup failed");
  return;
}

Bug: A specific error path in the setup wizard exits with code 0 (success) instead of 1 (failure) when a suspend payload is missing, causing silent failures.
Severity: MEDIUM

Suggested Fix

Before the return; statement on line 182 in the if (!payload) block, add process.exitCode = 1; to ensure the process exits with a failure code, consistent with other error-handling paths in the function.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/lib/init/wizard-runner.ts#L177-L183

Potential issue: In the `runWizard` function, if `extractSuspendPayload` returns an
undefined payload, the function logs an error and then executes an early `return` on
line 182. This specific error path fails to set `process.exitCode = 1` before exiting.
As a result, a failed setup wizard run will incorrectly report a success status (exit
code 0) to the calling process. This behavior is inconsistent with all other failure
paths within the same function, which correctly set the exit code to 1, and can cause
CI/CD pipelines or automation scripts to misinterpret a failure as a success.
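
One way to make this class of omission harder to repeat is to route every failure through a single helper that both reports the error and sets the exit code. This is a sketch of that idea, not the fix applied in the PR; `failSetup` and its `report` parameter are hypothetical.

```typescript
// Hypothetical helper: report a failure and mark the process as failed.
// Setting process.exitCode (rather than calling process.exit) lets pending
// output flush before the process ends.
function failSetup(
  message: string,
  report: (msg: string) => void = console.error,
): void {
  report(message);
  process.exitCode = 1;
}
```

With such a helper, the early-return branch quoted in the review would call `failSetup(...)` before `return;` instead of relying on each call site to remember the exit code.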

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

{ pattern: ">", label: "redirection (>)" },
{ pattern: "<", label: "redirection (<)" },
{ pattern: "&", label: "background execution (&)" },
];
Parentheses bypass shell metacharacter and executable blocklist

High Severity

SHELL_METACHARACTER_PATTERNS blocks $( but not standalone ( or ). Since runSingleCommand uses shell: true, a command like (rm -rf .) passes all validation: no blocked metacharacters match, and the first-token extraction yields (rm whose path.basename is "(rm" — not in BLOCKED_EXECUTABLES. The shell then interprets (...) as a subshell, executing the blocked executable. This defeats both the metacharacter check and the executable blocklist for any command wrapped in parentheses.
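
The bypass can be reproduced with a small model of the validation. The lists below are illustrative subsets, not the PR's real `SHELL_METACHARACTER_PATTERNS` or `BLOCKED_EXECUTABLES`, and `validateBefore` / `validateAfter` are hypothetical names; the second variant shows one possible mitigation (rejecting bare parentheses), which is an assumption rather than the fix the authors chose.

```typescript
import * as path from "node:path";

// Illustrative subsets only; the real lists in the PR are longer.
const BLOCKED_EXECUTABLES = new Set(["rm", "curl", "sudo"]);
const BLOCKED_SUBSTRINGS = [";", "|", "`", "$(", ">", "<", "&"];

// First whitespace-separated token, reduced to its basename.
function firstToken(command: string): string {
  return path.basename(command.trim().split(/\s+/)[0] ?? "");
}

// Models the validation as described in the review: returns a rejection
// reason, or undefined when the command is allowed.
function validateBefore(command: string): string | undefined {
  const hit = BLOCKED_SUBSTRINGS.find((s) => command.includes(s));
  if (hit) return `blocked metacharacter: ${hit}`;
  const exe = firstToken(command);
  if (BLOCKED_EXECUTABLES.has(exe)) return `blocked executable: ${exe}`;
  return undefined;
}

// Possible mitigation (assumption): also reject standalone parentheses so a
// subshell cannot hide the real executable from the first-token check.
function validateAfter(command: string): string | undefined {
  if (/[()]/.test(command)) return "blocked metacharacter: parentheses";
  return validateBefore(command);
}
```

Here `validateBefore("(rm -rf .)")` returns `undefined` because the first token is `(rm`, which is not in the blocklist, while `validateAfter` rejects the same string.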
